Phylogenetic placement and comparative analysis of the mitochondrial genomes of Idiostoloidea (Hemiptera: Heteroptera)

Abstract The classification system and the higher level phylogenetic relationships of Pentatomomorpha, the second largest infraorder of Heteroptera (Insecta: Hemiptera), have been debated and remain controversial over decades. In particular, the placement and phylogenetic relationship of Idiostoloidea are not well resolved, which hampers a better understanding of the evolutionary history of Pentatomomorpha. In this study, for the first time, we reported the complete mitochondrial genome for two narrowly distributed families of Idiostoloidea (including Idiostolidae and Henicocoridae), respectively. The length of the mitochondrial genome of Monteithocoris hirsutus and Henicocoris sp. is 16,632 and 16,013 bp, respectively. The content of AT is ranging from 75.15% to 80.48%. The mitogenomic structure of Idiostoloidea is highly conservative and there are no gene arrangements. By using the Bayesian inference, maximum likelihood, and Bayesian site‐heterogeneous mixture model, we inferred the phylogenetic relationships within Pentatomomorpha and estimated their divergence times based on concatenated mitogenomes and nuclear ribosomal genes. Our results support the classification system of six superfamilies within Pentatomomorpha and confirm the monophyletic groups of each superfamily, with the following phylogenetic relationships: (Aradoidea + (Pentatomoidea + (Idiostoloidea + (Coreoidea + (Pyrrhocoroidea + Lygaeoidea))))). Furthermore, estimated divergence times revealed that most pentatomomorphan superfamilies and families diverged during the Late Jurassic to Early Cretaceous, which coincides with the explosive radiation of angiosperms.

Therefore, it is important to investigate the placement and phylogenetic relationship of Idiostoloidea within the Pentatomomorpha.

Phylogenetic analyses of the superfamily or family level within
Pentatomomorpha have been the subject of many studies based on molecular and morphological data, but relatively few studies have addressed the relationship of Idiostoloidea (Figure 1).The main reason for this may be due to the restricted distribution (southern South America and Australia) of Idiostoloidea and high difficulty in collection as the knowledge of biology of these organisms is limited (Cassis & Gross, 2002;Schuh & Slater, 1995).Henry (1997) proposed a sister group relationship between Idiostoloidea and Lygaeoidea, together with Coreoidea and Pyrrhocoroidea formed a monophyletic group based on morphological characters (Figure 1a).Weirauch et al. (2019) recently provided a comprehensive analysis of Heteroptera using molecular sequence data (partial sequence of 16S rDNA, 18S rDNA, and 28S rDNA) combined with morphological data treated Idiostoloidea as the sister group of the remaining Trichophora (including the superfamilies of Pentatomomorpha without Aradoidea) (Figure 1f), but the support value for this grouping is very low (bootstrap values lower than 50%), which indicated that these related nodes (Idiostoloidea, Pentatomoidea, and the rest of the Trichophora) are likely to be left unresolved, while increasing the number of characters may improve the support and resolution of phylogenetic inference (Delsuc et al., 2005;Wortley et al., 2005).
In the phylogenomic era, the mitogenome has become the most extensively used genomic resource in insects (Cameron, 2014), owing to its compact size (typically 15-18 kb), as well as their high copy number within the cell, which allows using the cost-effective approach of shotgun assembly based on a relatively shallow next generation sequencing without any tedious PCR amplification (Kocher et al., 2014).However, mitogenome data alone have limited applicability in higher level phylogenetic inference of insects, because of these biases in accelerated substitution rates and compositional heterogeneity (Yang et al., 2018).On the other hand, nuclear ribosomal genes have been successfully used to study the phylogenetic relationships of various taxonomic levels (e.g., family level), due TA B L E 1 Previous classification systems of Pentatomomorpha.

| Taxon sampling
A total of 28 species of Heteroptera were sampled in this study, including 25 species from Pentatomomorpha as ingroups and three species from Leptopodomorpha and Cimicomorpha as outgroups (Table S1).The ingroups covered all six extant superfamilies of Pentatomomorpha.Among these, two mitogenomes and complete nuclear ribosomal sequences (18S rRNA, ITS1, 5.8S rRNA, ITS2, and 28S rRNA) of Idiostoloidea, and four mitogenomes of Lygaeoidea are reported here for the first time.All these newly sequenced species represent six different families, with the mitogenome of the rarely collected family of Henicocoridae being first reported here.
The collecting information of six newly sequenced species is provided in Table S2, and specimens were preserved in 95% ethanol  S1.

| Sequencing and assembly
We extracted genomic DNA from thoracic muscle tissue using the DNeasy DNA Extraction kit (QIAGEN).Four mitogenomes of Lygaeoidea were generated by the amplification of four overlapping PCR fragments, and the primers used for amplification are listed in Table S3.Details of the PCR reaction conditions were described in our previous study (Li, Yang, et al., 2016) and then sequenced by ABI 3730XL capillary sequencer (Applied Biosystems).Raw sequence files were proofread and then assembled into contigs using BioEdit version 7.0.5.2 (Hall, 1999).Two entire mitogenomes of Idiostoloidea were obtained using Illumina HiSeq 2000 platform (Illumina, San Diego, CA) with 200 bp insert size and a paired-end 100-bp sequencing strategy at BGI-Shenzhen, China.The sequence reads were first assessed with FastQC (Babraham Bioinformatics), adapter sequences and low-quality reads were trimmed with Trimmomatic (Bolger et al., 2014), and then the remaining high-quality reads were assembled using SOAPdenovo-Trans (Xie et al., 2014).The resulting contigs were subsequently used as query in a blastn (Camacho et al., 2009) search against the reference mitogenomes and only those with a high hit score were retained as putative mitochondrial genomes.Additionally, the complete nuclear ribosomal gene sequences of Idiostoloidea were detected by using the assembled contigs BLAST against the 18S and 28S rDNAs of Eurydema maracandica (Yu et al., 2013).

| Annotation and alignment
The assembled contigs and scaffolds were annotated for proteincoding genes (PCGs) and rRNAs by BLAST searches of GenBank and alignment with homologous sequences, while tRNAs were identified according to tRNAscan-SE version 1.21 (Lowe & Eddy, 1997).
Each PCG was aligned separately based on amino acid translation using MAFFT (Katoh et al., 2002) as implemented in the TranslatorX online server (Abascal et al., 2010).Divergent and ambiguously aligned regions were removed from the protein alignment before back-translating to nucleotides using Gblocks (Talavera & Castresana, 2007) within TranslatorX.The rRNA genes were individually aligned in MAFFT v7.0 online server (Katoh & Standley, 2013) using the G-INS-i algorithm.All aligned nucleotide sequences were checked manually in MEGA 7 (Kumar et al., 2016), and then concatenated to reconstruct the phylogeny excluding stop codons.The codon usage, nucleotide composition, and amino acid composition were calculated with MEGA 7 (Kumar et al., 2016).The formulas "AT skew = (A − T)/(A + T)" and "GC skew = (G − C)/(G + C)" were used to calculate strand asymmetry (Perna & Kocher, 1995).

| Phylogenetic analyses
AliGROOVE (Kück et al., 2014) was employed to detect the heterogeneous sequence divergence within the alignment (e.g., each codon position of PCGs), while the third codon position of the PCGs showed strong sequence heterogeneity with mainly negative pairwise similarity scores (Figure S1).Thus, phylogenetic analyses were performed with three data matrices excluding the third codon and stop codon positions: (1) PCG12 matrix including the first and the second codon positions of the 13 PCGs; (2) PCG12rDNA matrix, including the PCG12 dataset with two rDNAs of 18S and 28S; (3) PCG12RNArDNA matrix, including the PCG12rDNA dataset with two rRNA genes of 16S and 12S.PartitionFinder 2 (Lanfear et al., 2017) was then used to detect the best partitioning schemes and corresponding nucleotide substitution models.The best-fit partitioning schemes for these three datasets proposed by PartitionFinder 2 were utilized in the subsequent phylogenetic analyses (Table S4).
We used GPU MrBayes (Zhou et al., 2011) and PhyloBayes MPI version 1.7 (Lartillot et al., 2013) for Bayesian inference (BI), and raxmlGUI 1.2 (Silvestro & Michalak, 2012) for maximum likelihood (ML) analyses to reconstruct phylogenetic trees.MrBayes analysis was carried out with two simultaneous runs of 10,000,000 generations conducted for each matrix.Each set was sampled every 100 generations.In PhyloBayes analysis, we use the site-heterogeneous model CAT + GTR, which has been suggested to reduce the artifacts resulting from mutational saturation and unequal patterns of substitution (Lartillot & Philippe, 2004;Song et al., 2016).Two independent chains were run in parallel until the analyses satisfactorily converged (maxdiff less than 0.3).For these two BI methods, trees that were sampled prior to stationarity (at 25% of the run) were discarded as burn-in, and the remaining trees were used to construct a 50% majority-rule consensus tree.For the ML analyses, GTR + I + Γ model was used for each matrix, and the node support was assessed with 1000 bootstrap replicates.

| Molecular dating
We analyzed the concatenated DNA sequence alignment of PCG12RNArDNA using BEAST 1.8.0 under the uncorrelated lognormal relaxed clock model (Drummond & Rambaut, 2007).A Yule speciation process was used for the tree prior (Gernhard, 2008).
The molecular clock was calibrated by six fossils with minimum age constraints, coupled with exponential priors on node times were used for these fossil calibrations (Table S5).Two replicate Markov chain Monte Carlo (MCMC) runs were performed with the tree and parameter values sampled every 1000 steps over a total of 50 million generations.Acceptable effective sample sizes (ESS) and convergence to the stationary distribution were checked by Tracer 1.7 (Rambaut et al., 2018).A maximum clade credibility tree was obtained from TreeAnnotator (Bouckaert et al., 2014) with a burn-in of the first 10% of trees.

| Mitogenome comparisons
The complete mitochondrial genome of Monteithocoris hirsutus and Henicocoris sp. is circular double-stranded molecules, which span 16,632 and 16,013 bp, respectively (Figure 2), matching very well with the ancestral gene arrangement of Drosophila yakuba (Clary & Wolstenholme, 1985).The nucleotide composition of Monteithocoris hirsutus and Henicocoris sp. is significantly AT biased ranging from 75.15% to 80.48% (Table 2), which is congruent with the rich A + T composition in insect mitogenomes (Cameron, 2014).J-strand tRNAs, and rRNAs is positive, indicating a higher content of G nucleotides than C nucleotides, while GC skew of the whole genome and J-strand PCGs are negative.The differences in GC skew between Monteithocoris hirsutus and Henicocoris sp. are observed in tRNAs, N-strand tRNAs, and the control region.This kind of strand bias of nucleotide composition is likely related to asymmetric mutation processes during replication (Hassanin et al., 2005).
The mitogenome-wide bias toward A + T is also reflected in the codon usage.We summarized the relative synonymous codon usage (RSCU) values for the mitogenomes of Monteithocoris hirsutus and Henicocoris sp.(Figure 4).According to the results of RSCU, synonymous codons ending with an A or U are more prevalent than those ending in a G or C, which were also observed in other heteropteran species (Li et al., 2013;Wang et al., 2016).For instance, UUU (RSCU = 1.67) of Monteithocoris hirsutus is more common than UUC (RSCU = 0.33) for Phe, which is the same with Henicocoris sp.(the RSCU of UUU is 1.79 while UUC is 0.21).Four most frequently used codons are AUA-Met, AUU-Ile, UUA-Leu2, and UUU-Phe (Figure 5), which are all composed wholly of A or U, which may also reflect the A + T bias of the whole mitogenomes.
The 13 PCGs of Monteithocoris hirsutus are 11,052 bp while the 13 PCGs of Henicocoris sp. are 11,055 bp, which is the typical order for Pentatomomorpha (Hua et al., 2008;Yuan et al., 2015;Zhao et al., 2018Zhao et al., , 2019)).All PCGs are initiated by ATN or TTG as a start codon.TAA has been assigned to the majority of the PCGs, while ND3 of Monteithocoris hirsutus and COI of Henicocoris sp.terminate with a single T residue (Tables S6 and S7).The incomplete termination codon could be presumed to be generated by the posttranscriptional polyadenylation (Ojala et al., 1981) S3) except tRNA-Ser (GCU), which lacked the dihydrouridine (DHU) arm.This phenomenon found in tRNA-Ser (GCU) has been widely reported in many other hemipterans (Zhang et al., 2018(Zhang et al., , 2019)).The rRNA genes of 16S are assumed to be between tRNA-Leu (UAG) and tRNA-Val, while 12S rRNA genes are located at tRNA-Val and control region (Figure 2).The rRNA genes of Monteithocoris hirsutus and Henicocoris sp. are 2062 and 2054 bp long, with the A + T percent of 78.47% and 78.43%, respectively, indicating a moderate AT preference.

| Phylogenetic analyses
Insect mitogenomes, especially within Hemiptera, are frequently characterized by accelerated substitution rates and compositional heterogeneity (e.g., A + T content heterogeneity), which can potentially return artifactual relationships in the higher level phylogenetic analyses (Bernt et al., 2013;Li et al., 2015;Yang et al., 2018).
Bayesian analyses using site-heterogeneous models implemented in PhyloBayes (e.g., CAT + GTR) can lessen the impact of compositional biases and alleviate related phylogenetic artifacts (e.g., long branch attraction) (Lartillot et al., 2007;Song et al., 2016;Timmermans et al., 2015).Therefore, we employ the homogene- Pentatomoidea, and Pyrrhocoroidea) were recovered as monophyletic groups in our analyses.Aradoidea was supported as the sister group to the remaining lineages, which was consistent with previous studies (Henry, 1997;Hua et al., 2008;Tian et al., 2011;Wang et al., 2016;Weirauch et al., 2019;Ye et al., 2022).The The relationships among the three superfamilies Coreoidea, Lygaeoidea, and Pyrrhocoroidea within Eutrichophora remain highly debated and contentious (Henry, 1997;Hua et al., 2008;Liu et al., 2019;Tian et al., 2011;Weirauch et al., 2019;Yang et al., 2018;Zhao et al., 2021).All of our analyses recovered Coreoidea formed the sister group to the clade of Pyrrhocoroidea + Lygaeoidea, which received strong support in Bayesian analyses using both homogeneous and heterogeneous models, but weak support in ML bootstrap analyses (Figure 6).This relationship was also supported by previous analyses using mitogenomes with Bayesian inference under site-heterogeneous models (Liu et al., 2019), mitochondrial PCGs and nuclear ribosomal genes (Wang et al., 2016), and 21 nuclear and mitochondrial genes (Li, Wang, et al., 2016).Within Lygaeoidea, our results highly supported a close relationship between Heterogastridae and Pachygronthidae, which together were placed at the basal lineage as the sister group to the remaining families of Lygaeoidea (Figure 6).Moreover, Piesmatidae was strongly supported as the sister group of Oxycarenidae, thereby confirming that Piesmatidae is a member of Lygaeoidea rather than being raised to a superfamily, as proposed by Štys (1961), Štys (1967), Štys and Kerzhner (1975), Carver et al. (1991), and Henry and Froeschner (1988).We also recovered the groups (Colobathristidae

| Divergence time estimation
The phylogenetic chronogram of Pentatomomorpha diversification based on PCG12RNArDNA matrix is shown in Figure 6.Our analyses suggested that the divergence of the infraorder Pentatomomorpha occurred 215 Ma (95% confidence interval (CI): 204-227 Ma) in the Late Triassic, which is consistent with recent studies based on multilocus (Li et al., 2017;Wang et al., 2016) and transcriptome data (Wang et al., 2019).The most recent common ancestor of

| CON CLUS IONS
In this study, two mitogenomes and complete nuclear ribosomal

ACK N OWLED G M ENTS
We are grateful to Dr. Dávid Rédei (Nankai University) and Dr.
Cuiqing Gao (Nanjing Forestry University) for identifying our sam- and stored at −20°C in the Insect Molecular Systematic Lab, Institute of Entomology, College of Life Sciences, Nankai University, Tianjin, China.Additionally, the mitogenomes of the other 22 heteropterans, coupled with the remaining 18S and 28S rDNA sequences (except for six species), were obtained from the National Center for Biotechnology Information (NCBI) database.The summary of sample information and corresponding GenBank accession numbers are listed in Table
The nucleotide skew statistics for the mitogenome of Monteithocoris hirsutus and Henicocoris sp.show that the whole genome, tRNA genes, and J-strand tRNAs are AT skewed, whereas the other PCGs, J-strand PCGs, N-strand PCGs, and rRNAs are TA skewed.The difference in AT skew between Monteithocoris hirsutus and Henicocoris sp. is Nstrand tRNAs and control region.The N-strand tRNAs and control region are AT skewed in Monteithocoris hirsutus and TA skewed in Henicocoris sp.(Figure 3).The GC skew of PCGs, N-strand PCGs, F I G U R E 2 Structure of the mitochondrial genome of Monteithocoris hirsutus (left) and Henicocoris sp.(right).PCGs are shown as purple bars, rRNA genes as blue bars, tRNA genes as green bars, and control region as yellow bars.tRNAs are named using single-letter amino acid abbreviations.TA B L E 2 Nucleotide composition and skewness of the Monteithocoris hirsutus and Henicocoris sp.mitochondrial genome.
ous model (GTR + I + Γ) in both ML and MrBayes, as well as the heterogeneous model (CAT + GTR) in PhyloBayes to analyze the data matrices of PCG12, PCG12rDNA, and PCG12RNArDNA, with the results summarized in Figure 6.All nine phylogenetic trees revealed the same topology regarding superfamily-level relationships except the position of Idiostoloidea in PCG12RNArDNA of PhyloBayes under the CAT + GTR model.The monophyly of Pentatomomorpha was strongly supported, and all six superfamilies (Aradoidea, Coreoidea, Idiostoloidea, Lygaeoidea,

F
The AT skew and GC skew of Monteithocoris hirsutus (left) and Henicocoris sp.(right).F I G U R E 4 The RSCU in the Monteithocoris hirsutus and Henicocoris sp.mitochondrial PCGs.F I G U R E 5 Amino acid composition in the Monteithocoris hirsutus and Henicocoris sp.mitogenomes.Codon families are provided on the x-axis.Numbers of codons of each amino acid are provided on the y-axis.superfamilies of Trichophora were retrieved as (Pentatomoidea + (Idiostoloidea + (Coreoidea + (Pyrrhocoroidea + Lygaeoidea)))), but Idiostoloidea was not supported as the sister group of Eutrichophora in PCG12RNArDNA of PhyloBayes under the CAT + GTR model.Our results of the placement of Idiostoloidea were congruent with the results of Ye et al. (2022), which were based on nuclear and mitochondrial PCGs and rRNA genes, but the mitogenome data of Idiostoloidea are limited in their study (i.e., they only have two partial PCGs represent for Henicocoridae).

+
(Berytidae + (Malcidae + (Geocoridae + Rhyparochromidae)))), but without strong support from either ML or PhyloBayes under the CAT + GTR model.However, our principal goal in this study was to investigate the phylogenetic position of Idiostoloidea and the evolutionary history of Pentatomomorpha.As we did not sample all the families of Eutrichophora and Lygaeoidea, a more thorough sampling of taxa is required to adequately resolve the higher level relationships within Eutrichophora and Lygaeoidea, respectively.F I G U R E 6 Phylogenetic chronogram of Pentatomomorpha based on mitochondrial genomes and nrDNAs (PCG12RNArDNA) by using BEAST.Red bars denote the calibration points, and blue bars indicate 95% mean confidence intervals.Colored cubes at the nodes indicate the support values (from top to bottom rows: ML support values, Bayesian posterior probabilities (BPP) of MrBayes, BPP of PhyloBayes) for different data matrices (from left to right columns: PCG12, PCG12rDNA, PCG12RNArDNA), respectively.Asterisks indicate new mitochondrial genomes sequenced in this study.The geological time scale is shown at the bottom.
sequences of Idiostoloidea and four mitogenomes of Lygaeoidea are reported here for the first time.Coupled with published data, phylogenetic analyses and divergence time were conducted in Pentatomomorpha.Our results confirm the monophyletic groups of each superfamily, with the following phylogenetic relationships: (Aradoidea + (Pentatomoidea + (Idiostoloidea + (Coreoidea + (Pyrrhocoroidea + Lygaeoidea))))).Furthermore, estimated divergence times revealed that most pentatomomorphan superfamilies and families diverged during the Late Jurassic to Early Cretaceous, which coincides with the explosive radiation of angiosperms.In comparison of mitogenomes, relatively little variation was observed in the length of PCGs, tRNAs, and rRNAs.The gene order matches very well with the ancestral gene arrangement of Drosophila yakuba.The AT content was significantly higher than the GC content.Our findings reveal the relationships among superfamilies, and more mitogenomes and nuclear genes should be sequenced to comprehensively understand the mitogenomic evolution and phylogenetic relationships of Pentatomomorpha.Data curation (equal); formal analysis (equal); investigation (equal); visualization (equal); writing -original draft (lead).XiaoYan Chen: Data curation (equal); investigation (equal).Jingjing Yang: Methodology (equal); software (equal).Wenbo Yi: Data curation (equal); methodology (equal).Qiang Xie: Methodology (equal).HuanHuan Yang: Formal analysis (equal).Merrill H. Sweet: Resources (equal).Wenjun Bu: Resources (equal); supervision (equal); writingreview and editing (equal).Teng Li: Resources (equal); supervision (equal); writing -review and editing (equal).
ples of Monteithocoris hirsutus, Henicocoris sp., Parapiesma salsolae, Heterogaster chinensis, Pachygrontha antennata, and Oxycarenus pallens.This research was supported by Natural Science Foundation of Shanxi Province (No. 202103021223318), National Natural Science Foundation of China (No. 32300377, 32130014, 31820103013, 31501879, 32000318) and Natural Science Foundation of Shandong Province (No. ZR2020QC054), and Innovation and entrepreneurship training program for college students (CXCY2349).Open access publishing facilitated by The University of Auckland, as part of the Wiley -The University of Auckland agreement via the Council of Australian University Librarians.